Search CORE

723 research outputs found

Trident: Controlling Side Effects in Automated Program Repair

Author: Barr ET
Mechtaev S
Parasaram N
Publication venue
Publication date: 13/11/2021
Field of study

The goal of program repair is to eliminate a bug in a given program by automatically modifying its source code. The majority of real-world software is written in imperative programming languages. Each function or expression in imperative code may have side effects, observable effects beyond returning a value. Existing program repair approaches have a limited ability to handle side effects. Previous test-driven semantic repair approaches only synthesise patches without side effects. Heuristic repair approaches generate patches with side effects only if suitable code fragments exist in the program or a database of repair patterns, or can be derived from training data. This work introduces Trident, the first test-driven program repair approach that synthesizes patches with side effects without relying on the plastic surgery hypothesis, a database of patterns, or training data. Trident relies on an interplay of several parts. First, it infers a specification for synthesising side-effected patches using symbolic execution with a custom state merging strategy that alleviates path explosion due to side effects. Second, it uses a novel component-based patch synthesis approach that supports lvalues, values that appear on the left-hand sides of assignments. In an evaluation on open-source projects, Trident successfully repaired 6 out of 10 real bugs that require insertion of new code with side effects, which previous techniques do not therefore repair. Evaluated on the ManyBugs benchmark, Trident successfully repaired two new bugs that previous approaches could not. Adding patches with side effects to the search space can exacerbate test-overfitting. We experimentally demonstrate that the simple heuristic of preferring patches with the fewest side effects alleviates the problem. An evaluation on a large number of smaller programs shows that this strategy reduces test-overfitting caused by side-effects, increasing the rate of correct patches from 33.3% to 58.3%

UCL Discovery

Approximate Oracles and Synergy in Software Energy Search Spaces

Author: Barr ET
Bruce BR
Harman M
Petke J
Publication venue
Publication date: 01/11/2019
Field of study

Reducing the energy consumption of software systems though optimisations techniques such as genetic improvement is gaining interest. However, efficient and effective improvement of software systems requires a better understanding of the code-change search space. One important choice practitioners have is whether to preserve the system & #x0027;s original output or permit approximation with each scenario having its own search space characteristics. When output preservation is a hard constraint, we report that the maximum energy reduction achievable by the modification operators is 2.69% (0.76% on average). By contrast, this figure increases dramatically to 95.60% (33.90% on average) when approximation is permitted, indicating the critical importance of approximate output quality assessment for code optimisation. We investigate synergy, a phenomenon that occurs when simultaneously applied source code modifications produce an effect greater than their individual sum. Our results reveal that 12.0% of all joint code modifications produced such a synergistic effect though 38.5% produce an antagonistic interaction in which simultaneously applied modifications are less effective than when applied individually. This highlights the need for more advanced search-based approaches

UCL Discovery

Automated transplantation of call graph and layout features into Kate

Author: Barr ET
Harman M
Jia Y
Marginean A
Publication venue: Springer International Publishing
Publication date: 28/07/2015
Field of study

We report the automated transplantation of two features currently missing from Kate: call graph generation and automatic layout for C programs, which have been requested by users on the Kate development forum. Our approach uses a lightweight annotation system with Search Based techniques augmented by static analysis for automated transplantation. The results are promising: on average, our tool requires 101 min of standard desktop machine time to transplant the call graph feature, and 31 min to transplant the layout feature. We repeated each experiment 20 times and validated the resulting transplants using unit, regression and acceptance test suites. In 34 of 40 experiments conducted our search-based autotransplantation tool, μSCALPEL, was able to successfully transplant the new functionality, passing all tests

UCL Discovery

CodeGrid: A Grid Representation of Code

Author: Barr ET
Bissyandé TF
Kaboré AK
Klein J
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 13/07/2023
Field of study

Code representation is a key step in the application of AI in software engineering. Generic NLP representations are effective but do not exploit all the rich structure inherent to code. Recent work has focused on extracting abstract syntax trees (AST) and integrating their structural information into code representations.These AST-enhanced representations advanced the state of the art and accelerated new applications of AI to software engineering. ASTs, however, neglect important aspects of code structure, notably control and data flow, leaving some potentially relevant code signal unexploited. For example, purely image-based representations perform nearly as well as AST-based representations, despite the fact that they must learn to even recognize tokens, let alone their semantics. This result, from prior work, is strong evidence that these new code representations can still be improved; it also raises the question of just what signal image-based approaches are exploiting. We answer this question. We show that code is spatial and exploit this fact to propose , a new representation that embeds tokens into a grid that preserves code layout. Unlike some of the existing state of the art, is agnostic to the downstream task: whether that task is generation or classification, can complement the learning algorithm with spatial signal. For example, we show that CNNs, which are inherently spatially-aware models, can exploit outputs to effectively tackle fundamental software engineering tasks, such as code classification, code clone detection and vulnerability detection. PixelCNN leverages 's grid representations to achieve code completion. Through extensive experiments, we validate our spatial code hypothesis, quantifying model performance as we vary the degree to which the representation preserves the grid. To demonstrate its generality, we show that augments models, improving their performance on a range of tasks, On clone detection, improves ASTNN's performance by 3.3% F1 score

UCL Discovery

Flexeme: Untangling Commits Using Lexical Flows

Author: Allamanis M
Barr ET
Dash SK
Pârtachi PP
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/11/2020
Field of study

UCL Discovery

Today was a Good Day: The Daily Life of Software Developers

Author: Barr ET
Bird C
Meyer A
Zimmermann T
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019
Field of study

What is a good workday for a software developer? What is a typical workday? We seek to answer these two questions to learn how to make good days typical. Concretely, answering these questions will help to optimize development processes and select tools that increase job satisfaction and productivity. Our work adds to a large body of research on how software developers spend their time. We report the results from 5971 responses of professional developers at Microsoft, who reflected about what made their workdays good and typical, and self-reported about how they spent their time on various activities at work. We developed conceptual frameworks to help define and characterize developer workdays from two new perspectives: good and typical. Our analysis confirms some findings in previous work, including the fact that developers actually spend little time on development and developers' aversion for meetings and interruptions. It also discovered new findings, such as that only 1.7% of survey responses mentioned emails as a reason for a bad workday, and that meetings and interruptions are only unproductive during development phases; during phases of planning, specification and release, they are common and constructive. One key finding is the importance of agency, developers' control over their workday and whether it goes as planned or is disrupted by external factors. We present actionable recommendations for researchers and managers to prioritize process and tool improvements that make good workdays typical. For instance, in light of our finding on the importance of agency, we recommend that, where possible, managers empower developers to choose their tools and tasks

UCL Discovery

Game-theoretic analysis of development practices: Challenges and opportunities

Author: Barr ET
Gavidia-Calderon C
Harman M
Sarro F
Publication venue
Publication date: 01/01/2020
Field of study

Developers continuously invent new practices, usually grounded in hard-won experience, not theory. Game theory studies cooperation and conflict; its use will speed the development of effective processes. A survey of game theory in software engineering finds highly idealised models that are rarely based on process data. This is because software processes are hard to analyse using traditional game theory since they generate huge game models. We are the first to show how to use game abstractions, developed in artificial intelligence, to produce tractable game-theoretic models of software practices. We present Game-Theoretic Process Improvement (GTPI), built on top of empirical game-theoretic analysis. Some teams fall into the habit of preferring “quick-and-dirty” code to slow-to-write, careful code, incurring technical debt. We showcase GTPI’s ability to diagnose and improve such a development process. Using GTPI, we discover a lightweight intervention that incentivises developers to write careful code: add a singlecode reviewer who needs to catch only 25% of kludges. This 25% accuracy is key; it means that a reviewer does not need to examine each commit in depth, making this process intervention cost-effective

UCL Discovery

The Oracle Problem in Software Testing: A Survey

Author: Barr ET
Harman M
McMinn P
Shahbaz M
Yoo S
Publication venue
Publication date: 20/11/2014
Field of study

Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behavior is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice

CiteSeerX

UCL Discovery

On the naturalness of software

Author: Barr ET
Devanbu P
Gabel M
Hindle A
Su Z
Publication venue
Publication date: 01/05/2016
Field of study

Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations---and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether (a) code can be usefully modeled by statistical language models and (b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very regular, and, in fact, even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse's completion capability. We conclude the paper by laying out a vision for future research in this area

UCL Discovery

June: A Type Testability Transformation for Improved ATG Performance

Author: Barr ET
Bruce D
Clark D
Kelly D
Menendez H
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 12/07/2023
Field of study

Strings are universal containers: they are flexible to use, abundant in code, and difficult to test. String-controlled programs are programs that make branching decisions based on string input. Automatically generating valid test inputs for these programs considering only character sequences rather than any underlying string-encoded structures, can be prohibitively expensive. We present June, a tool that enables Java developers to expose any present latent string structure to test generation tools. June is an annotation-driven testability transformation and an extensible library, JuneLib, of structured string definitions. The core JuneLib definitions are empirically derived and provide templates for all structured strings in our test set. June takes lightly annotated source code and injects code that permits an automated test generator (ATG) to focus on the creation of mutable substrings inside a structured string. Using June costs the developer little, with an average of 2.1 annotations per string-controlled class. June uses standard Java build tools and therefore deploys seamlessly within a Java project. By feeding string structure information to an ATG tool, June dramatically reduces wasted effort; branches are effortlessly covered that would otherwise be extremely difficult, or impossible, to cover. This waste reduction both increases and speeds coverage. EvoSuite, for example, achieves the same coverage on June-ed classes in 1 minute, on average, as it does in 9 minutes on the un-June-ed class. These gains increase over time. On our corpus, June-ing a program compresses 24 hours of execution time into ca. 2 hours. We show that many ATG tools can reuse the same June-ed code: a few June annotations, a one-off cost, benefit many different testing regimes

UCL Discovery